Pilot and data signals for MIMO systems using channel statistics

ABSTRACT

A method generates signals in a transmitter of a multiple-input, multiple-output wireless communications system. The transmitter includes N_(t) transmit antennas. A transmit covariance matrix R_(t) is determined using statistical state information of a channel. The transmit covariance matrix R_(t) is decomposed using transmit eigenvalues Λ_(t) to obtain a transmit eigenspace U_(t) according to R_(t)=U_(t)Λ_(t)U^(†)_(t), where † denotes the Hermitian transpose. A pilot eigenspace U_(p) is set equal to the transmit eigenspace U_(t). An N_(t)×T_(p) block of pilot symbols X_(p) is generated from the pilot eigenspace U_(p) and pilot eigenvalues Λ_(p) according to X_(p)=U_(p)Λ_(p)^(1/2). A data eigenspace U_(d) is set equal to the transmit eigenspace U_(t). In addition, an N_(t)×N_(t) data covariance matrix Q_(d) is generated according to Q_(d)=U_(d)Λ_(d)U^(†)_(d), where Λ_(d) are data eigenvalues. An N_(t)×T_(d) block of data symbols X_(d) is generated such that the average covariance of each column of X_(d) equals the data covariance matrix Q_(d). The blocks of pilot and data symbols form the signals to be transmitted.

FIELD OF THE INVENTION

This invention relates generally to multiple transmit antenna systems, and more particularly to determining pilot and data signals for such systems.

BACKGROUND OF THE INVENTION

Multiple-input, multiple-output (MIMO) communications can significantly increase the spectral efficiency of wireless systems. Under idealized conditions, the channel capacity increases linearly with the smaller of the numbers of transmit and receive antennas; see Winters, “On the capacity of radio communication systems with diversity in a Rayleigh fading environment,” IEEE Trans. Commun., vol. 5, pp. 871-878, June 1987; Foschini et al., “On the limits of wireless communications in a fading environment when using multiple antennas,” Wireless Pers. Commun., vol. 6, pp. 311-335, 1998; and Telatar, “Capacity of multi-antenna Gaussian channels,” European Trans. Telecommun., vol. 10, pp. 585-595, 1999.

The possibility of high data rates has spurred work on the capacity achievable by MIMO systems under various assumptions about the channel, the transmitter and the receiver. The spatial channel model and assumptions about the channel state information (CSI) at the transmitter (CSIT) and the receiver (CSIR) have a significant impact on the MIMO capacity, Goldsmith et al., “Capacity limits of MIMO channels,” IEEE J. Select. Areas Commun., vol. 21, pp. 684-702, June 2003.

For most systems, instantaneous CSIT is not available. In frequency division duplex (FDD) systems, in which the forward and reverse links operate at different frequencies, instantaneous CSIT requires fast feedback, which decreases spectral efficiency. In time division duplex (TDD) systems, in which the forward and reverse links operate at the same frequency, using instantaneous CSIT is impractical in channels with short coherence intervals, because the delay between the two links must be very small to ensure that the CSIT, inferred from transmissions by the receiver, is not outdated by the time it is used.

These problems can be avoided by using covariance knowledge at the transmitter (CovKT). Small-scale-averaged statistics, such as the covariance, are determined by parameters such as the angular spread and the mean angles of arrival. These parameters remain substantially the same for both links, even in FDD or quickly varying TDD systems. Therefore, such statistics can be inferred directly at the transmitter from reverse-link transmissions, without explicit feedback from the receiver. Where feedback from the receiver is available, it can be sent at a significantly lower rate and bandwidth, given the slowly varying nature of the statistics.

The use of covariance knowledge at the transmitter to optimize the transmitted data sequences, assuming an idealized receiver with perfect CSIR, has been described by Visotsky et al., “Space-time transmit precoding with imperfect feedback,” IEEE Trans. Inform. Theory, vol. 47, pp. 2632-2639, September 2001, Kermoal et al., “A stochastic MIMO radio channel model with experimental validation,” IEEE J. Select. Areas Commun., pp. 1211-1226, 2002, Jafar et al., “Multiple-antenna capacity in correlated Rayleigh fading with channel covariance information,” to appear in IEEE Trans. Wireless Commun., 2004, Simon et al., “Optimizing MIMO antenna systems with channel covariance feedback,” IEEE J. Select. Areas Commun., vol. 21, pp. 406-417, April 2003, Jorswieck et al., “Optimal transmission with imperfect channel state information at the transmit antenna array,” Wireless Pers. Commun., pp. 33-56, October 2003, and Tulino et al., “Capacity of antenna arrays with space, polarization and pattern diversity,” in ITW, pp. 324-327, 2003.

However, in practical applications, the CSIR is imperfect due to noise during channel estimation.

MIMO capacity with imperfect CSIR has been studied for different system architectures, channel assumptions, and estimation error models. Many of these theoretical systems have been designed for spatially uncorrelated (‘white’) channels. While these theoretical solutions give valuable insights, they do not correspond to the physical reality of most practical MIMO channels, Molisch et al., “Multipath propagation models for broadband wireless systems,” Digital Signal Processing for Wireless Communications Handbook, M. Ibnkahla (ed.), CRC Press, 2004. In practical applications, the channel is often spatially correlated (‘colored’), and the transfer functions from the transmit antennas to the receive antennas do not vary independently of each other.

For the case where the CSIT is not available and MMSE channel estimation is used at the receiver, pilot-aided channel estimation for a block fading wireless channel has been described by Hassibi et al., “How much training is needed in multiple-antenna wireless links?,” IEEE Trans. Inform. Theory, pp. 951-963, 2003. They derive an optimal training sequence, training duration, and data and pilot power allocation ratio.

The problems of a mismatched closed-loop system have also been described by Samardzija et al., “Pilot-assisted estimation of MIMO fading channel response and achievable data rates,” IEEE Trans. Sig. Proc., pp. 2882-2890, 2003, and by Yoo et al., “Capacity of fading MIMO channels with channel estimation error,” Allerton, 2002. A data-aided coherent coded modulation scheme with a perfect interleaver is described by Baltersee et al., “Achievable rate of MIMO channels with data-aided channel estimation and perfect interleaving,” IEEE Trans. Commun., pp. 2358-2368, 2001.

Mutual information bounds for vector channels with imperfect CSIR are described by Medard, “The effect upon channel capacity in wireless communications of perfect and imperfect knowledge of the channel,” IEEE Trans. Inform. Theory, pp. 933-946, 2000.

Others have stated, in different contexts, that orthogonal pilots are optimal; see Guey et al., “Signal design for transmitter diversity wireless communication systems over Rayleigh fading channels,” IEEE Trans. Commun., vol. 47, pp. 527-537, April 1999, and Marzetta, “BLAST training: Estimating channel characteristics for high-capacity space-time wireless,” Proc. 37th Annual Allerton Conf. Commun., Control, and Computing, 1999.

Data covariance matrices for spatially correlated channels with imperfect CSIR are described by Yoo et al., “MIMO capacity with channel uncertainty: Does feedback help?,” submitted to Globecom, 2004. However, the imperfect channel estimation was modeled in an ad hoc manner, by adding white noise to the spatially white component of the channel state. Therefore, that model is inappropriate for many applications.

Lower and upper bounds on capacity for spatially white channels are described by Marzetta et al., “Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading,” IEEE Trans. Inform. Theory, vol. 45, pp. 139-157, January 1999. That work does not assume any a priori training scheme for acquiring the CSIR, and its bounds serve as fundamental limits on capacity.

Prior art systems either do not fully exploit statistical knowledge to determine the pilot and data sequences, or design only the pilot signals or only the data signals, but not both, and make idealized assumptions about the channel knowledge at the transmitter and/or the receiver.

In light of the problems with the prior art MIMO systems, it is desired to generate optimal pilot and data signals, even when the instantaneous and perfect channel state is unavailable at the transmitter and receiver.

SUMMARY OF THE INVENTION

The invention provides a method for generating pilot and data signals in a multiple-input, multiple-output (MIMO) communications system. In the system, the transmitter has access only to channel covariance statistics, while the receiver has access to instantaneous, albeit imperfect, channel state information (CSIR). The receiver can estimate the channel using a minimum mean square error (MMSE) estimator. No specific assumptions are made about the spatio-temporal processing that the receiver uses to detect the transmitted data.

It is a goal of the invention to fully exploit covariance knowledge at the transmitter to generate optimal pilot and data signals that increase the data rates achievable over wireless channels. The invention matches the eigenspaces of the pilot and data signals to the eigenspace of the transmit-side covariance of the channel. The invention also makes the ranks of the pilot and data covariance matrices equal. The rank determines how many of the stationary eigenmodes of the matrix are used. Rank matching thereby ensures that pilot power is not wasted on eigenmodes that are not used for data transmission, and vice versa. Furthermore, the duration of training with pilot signals, in units of symbol durations, equals the rank. For example, if the rank is three, then the training lasts three pilot symbols.

The invention can also assign powers to the different eigenmodes using numerical methods. Alternatively, the invention can use a simple uniform assignment of power to the pilot and data signals, which results in near-optimal performance. The invention also describes a relationship between the powers of corresponding pilot and data eigenmodes that reduces the complexity of the above numerical methods.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a transmitter according to the invention; and

FIG. 2 is a flow diagram of a method for generating pilot and data signals according to the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

System Structure

FIG. 1 shows a transmitter 100 according to our invention for a multiple-input, multiple-output wireless communications system. The transmitter transmits a block of symbols 101 having a total duration T and a total power P. The block 101 includes pilot signals 102 having a duration T_(p) and power P_(p), and data signals 103 having a duration T_(d) and power P_(d), such that T=T_(p)+T_(d) and P_(p)T_(p)+P_(d)T_(d)=PT. In the block 101, each row corresponds to one of the N_(t) transmit antennas.

The transmitter 100 includes multiple (N_(t)) antennas 105 for transmitting the pilot and data signals 101. The transmitter 100 also includes means 110 for determining statistical channel state information (SCSI) 111. By statistical information, we mean that we do not know the instantaneous state of the channel at the time the signals 101 are transmitted, which would be ideal. Instead, we know only how the channel state behaves statistically when observed over a relatively long duration of time.

The statistics can be determined directly or indirectly. In the direct mode, a receiver 150 communicating with the transmitter supplies the SCSI in feedback messages 108 in response to transmitted signals, in a so-called ‘closed-loop’ architecture. In the indirect mode, the SCSI is derived from signals 109 transmitted from time to time by the receiver 150. The statistics are in the form of covariance matrices, described in greater detail below.

We use the SCSI to generate 120 the pilot signals and to generate 130 the data signals. More specifically, we use the SCSI to determine the signals to be transmitted, the duration T_(p) of the pilot signals, and the allocation of the power P among the transmitted signals.

System Operation

As shown in FIG. 2, the method 200 in the transmitter 100 determines 210 statistical channel state information directly from the feedback 108, or indirectly from reverse-link transmissions 109. The SCSI is expressed in terms of a transmit covariance matrix R_(t). Because our invention is independent of the receive covariance matrix, we set the receive covariance matrix R_(r) equal to the N_(r)×N_(r) identity matrix.

An eigen decomposition 220 is performed on the transmit covariance matrix R_(t), which yields the transmit eigenvalues Λ_(t), the transmit eigenspace U_(t), and its Hermitian transpose U^(†)_(t), such that R_(t)=U_(t)Λ_(t)U^(†)_(t).
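As an illustration of the eigen decomposition 220, the following minimal NumPy sketch assumes a hypothetical exponential-correlation transmit covariance R_(t)[i,j]=ρ^|i−j| (with ρ=0.7 chosen arbitrarily) in place of measured statistics; exp_corr_covariance and transmit_eigenspace are illustrative names, not part of the described system. Because numpy.linalg.eigh returns eigenvalues in ascending order, they are reordered to the descending convention λ_(t1)≧λ_(t2)≧… used later in the text.

```python
import numpy as np


def exp_corr_covariance(n_t, rho):
    """Hypothetical example R_t: exponential correlation model R_t[i, j] = rho**|i - j|."""
    idx = np.arange(n_t)
    return rho ** np.abs(idx[:, None] - idx[None, :]).astype(float)


def transmit_eigenspace(R_t):
    """Return (U_t, lam_t) with R_t = U_t diag(lam_t) U_t^H and lam_t in descending order."""
    lam, U = np.linalg.eigh(R_t)        # Hermitian eigensolver, ascending eigenvalues
    order = np.argsort(lam)[::-1]       # descending: lambda_t1 >= lambda_t2 >= ...
    return U[:, order], lam[order]


R_t = exp_corr_covariance(4, 0.7)
U_t, lam_t = transmit_eigenspace(R_t)
U_t_H = U_t.conj().T                    # Hermitian transpose U_t^dagger
```

The reconstruction U_(t)Λ_(t)U^(†)_(t) reproduces R_(t), and the eigenvalues sum to Tr{R_(t)}.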

In the transmitter, pilot eigenvalues Λ_(p) 229 for the pilot signals and data eigenvalues Λ_(d) 239 for the data signals are determined. The eigenvalues depend only on the block duration T, the power P allocated to the signals 101, and the transmit eigenvalues Λ_(t). They can be determined beforehand using numerical search techniques, or using the near-optimal loading techniques described below.

Using the result of the eigen decomposition 220, in step 230, a pilot eigenspace U_(p) is set equal to the transmit eigenspace U_(t). The pilot eigenspace U_(p) and the pilot eigenvalues Λ_(p) are used to generate the N_(t)×T_(p) block 102 of pilot signals according to X_(p)=U_(p)Λ_(p)^(1/2), where Λ_(p)^(1/2) denotes the N_(t)×T_(p) rectangular diagonal matrix of the square roots of the pilot eigenvalues. In general, X_(p) can also have an arbitrary right eigenspace, i.e., an arbitrary T_(p)×T_(p) unitary matrix V_(p), and thereby take the general form X_(p)=U_(p)Λ_(p)^(1/2)V_(p)^(†).
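A sketch of step 230 under similar assumptions: the helper pilot_block (a hypothetical name) builds X_(p)=U_(p)Λ_(p)^(1/2)V_(p)^(†) with Λ_(p)^(1/2) as an N_(t)×T_(p) rectangular diagonal matrix, and the pilot eigenvalues are made-up values for a rank-2 pilot. It also illustrates that an arbitrary unitary right factor V_(p) leaves Q_(p)=X_(p)X_(p)^(†) unchanged.

```python
import numpy as np


def pilot_block(U_p, lam_p, T_p, V_p=None):
    """Pilot block X_p = U_p Lambda_p^{1/2} V_p^H of size N_t x T_p.

    Lambda_p^{1/2} is an N_t x T_p rectangular diagonal matrix holding the square
    roots of the first min(N_t, T_p) pilot eigenvalues; V_p defaults to I_{T_p}.
    """
    n_t = U_p.shape[0]
    k = min(n_t, T_p)
    sqrt_lam = np.zeros((n_t, T_p))
    sqrt_lam[np.arange(k), np.arange(k)] = np.sqrt(np.asarray(lam_p, dtype=float)[:k])
    if V_p is None:
        V_p = np.eye(T_p)
    return U_p @ sqrt_lam @ V_p.conj().T


# Hypothetical example: U_p = U_t from an exponential-correlation R_t.
idx = np.arange(4)
R_t = 0.7 ** np.abs(idx[:, None] - idx[None, :]).astype(float)
lam, U = np.linalg.eigh(R_t)
U_p = U[:, np.argsort(lam)[::-1]]          # pilot eigenspace, descending eigenvalues
lam_p = np.array([2.0, 1.0, 0.0, 0.0])     # hypothetical pilot eigenvalues (rank 2)
X_p = pilot_block(U_p, lam_p, T_p=2)

# A random unitary V_p (QR of a complex Gaussian matrix) leaves Q_p unchanged.
rng = np.random.default_rng(0)
V_p, _ = np.linalg.qr(rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2)))
X_p_rot = pilot_block(U_p, lam_p, T_p=2, V_p=V_p)
```

Here Tr{X_(p)X_(p)^(†)}=3 plays the role of the pilot energy P_(p)T_(p).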

In step 240, the data eigenspace U_(d) is set equal to the transmit eigenspace U_(t). The data eigenspace U_(d) and the data eigenvalues Λ_(d) are used to generate the N_(t)×N_(t) data covariance matrix Q_(d)=U_(d)Λ_(d)U^(†)_(d).

The result of step 240 is used in step 250 to generate the N_(t)×T_(d) block 103 of data signals such that the covariance E[x_(i)x_(i)^(†)] of every column x_(i) of the data symbol block equals the data covariance matrix Q_(d), for 1≦i≦T_(d).
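Steps 240 and 250 can be sketched as follows. The text only requires E[x_(i)x_(i)^(†)]=Q_(d) for every column; drawing circularly symmetric complex Gaussian symbols is an assumption made here for illustration, as are the data eigenvalues and the helper names data_covariance and data_block. The sample covariance of a long block should approach Q_(d).

```python
import numpy as np


def data_covariance(U_d, lam_d):
    """Q_d = U_d diag(lam_d) U_d^H (N_t x N_t)."""
    return U_d @ np.diag(lam_d) @ U_d.conj().T


def data_block(U_d, lam_d, T_d, rng):
    """N_t x T_d block whose columns are i.i.d. zero-mean with covariance Q_d.

    Assumption: circularly symmetric complex Gaussian symbols; the text only
    fixes the column covariance E[x_i x_i^H] = Q_d.
    """
    n_t = U_d.shape[0]
    s = (rng.standard_normal((n_t, T_d)) + 1j * rng.standard_normal((n_t, T_d))) / np.sqrt(2)
    return U_d @ np.diag(np.sqrt(lam_d)) @ s     # E[s s^H] = I, hence E[x x^H] = Q_d


idx = np.arange(4)
R_t = 0.7 ** np.abs(idx[:, None] - idx[None, :]).astype(float)   # hypothetical R_t
lam, U = np.linalg.eigh(R_t)
U_d = U[:, np.argsort(lam)[::-1]]            # data eigenspace U_d = U_t
lam_d = np.array([1.2, 0.8, 0.0, 0.0])       # hypothetical data eigenvalues, P_d = 2
Q_d = data_covariance(U_d, lam_d)

rng = np.random.default_rng(1)
X_d = data_block(U_d, lam_d, T_d=100_000, rng=rng)
Q_hat = X_d @ X_d.conj().T / X_d.shape[1]    # sample covariance of the columns
```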

The pilot symbol block and the data symbol block are combined in step 260 to form the N_(t)×T block of the signals 101, X=[X_(p), X_(d)]. Each of the N_(t) rows of the matrix X is fed to one of the N_(t) antennas 105.

The transmitter structure and operation are now described in greater detail.

MIMO Channel Model

We consider a MIMO system with N_(t) transmit antennas and N_(r) receive antennas operating on a block fading frequency-flat channel model in which the channel remains constant for T time instants, and decorrelates thereafter. Each time instant is one symbol long. Of the T time instants, T_(p) are used for transmitting pilot signals (pilot symbols), and the remaining T_(d)=T−T_(p) time instants are used for data signals (data symbols). We use the subscripts p and d for symbols related to pilot and data signals, respectively. P_(p) and P_(d) denote the power allocated to pilot and data signals, respectively. Lower and upper case boldface letters denote vectors and matrices, respectively.

An N_(r)×N_(t) matrix H denotes the instantaneous channel state, where h_(ij) denotes the complex fading gain from transmit antenna j to receive antenna i. Many channels can be represented by a covariance matrix expressed as the Kronecker product of the transmit and receive covariance matrices. The matrix H is then

$$H = R_r^{1/2} H_w R_t^{1/2}, \tag{1}$$

where R_(t) and R_(r) are the transmit and receive covariance matrices, respectively. The matrix H_(w) is spatially uncorrelated, i.e., its entries are zero-mean, independent, complex Gaussian random variables (RVs) with unit variance. Furthermore, we assume that R_(r)=I_(Nr), which holds when the receiver is in a rich scattering environment, e.g., the downlink of a cellular system or a wireless LAN link from an access point to a receiver. The transmit covariance matrix R_(t) is full rank.
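Equation (1) can be simulated directly. This sketch (hypothetical helpers psd_sqrt and draw_channel, hypothetical exponential-correlation R_(t)) draws H with R_(r)=I_(Nr) and checks numerically that E[H^(†)H]/N_(r) approaches R_(t), which follows from the independent, unit-variance entries of H_(w).

```python
import numpy as np


def psd_sqrt(R):
    """Hermitian square root of a positive semidefinite matrix."""
    lam, U = np.linalg.eigh(R)
    return U @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ U.conj().T


def draw_channel(R_t, n_r, rng, R_r=None):
    """One draw of H = R_r^{1/2} H_w R_t^{1/2}, equation (1); R_r defaults to I_{N_r}."""
    n_t = R_t.shape[0]
    H_w = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    R_r_half = np.eye(n_r) if R_r is None else psd_sqrt(R_r)
    return R_r_half @ H_w @ psd_sqrt(R_t)


idx = np.arange(4)
R_t = 0.7 ** np.abs(idx[:, None] - idx[None, :]).astype(float)   # hypothetical R_t
n_r, n_draws = 2, 10_000
rng = np.random.default_rng(2)

# With R_r = I, E[H^H H] = N_r R_t, so the normalized average should approach R_t.
acc = np.zeros((4, 4), dtype=complex)
for _ in range(n_draws):
    H = draw_channel(R_t, n_r, rng)
    acc += H.conj().T @ H
R_t_est = acc / (n_draws * n_r)
```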

Training Phase with Pilot Signals

A signal received during a training phase of duration T_(p) is an N_(r)×T_(p) matrix Y_(p)=[y_(ij)], where entry y_(ij) is the signal received at receive antenna i at time instant j. The matrix Y_(p) is given by

$$Y_p = H X_p + W_p, \tag{2}$$

where X_(p)=[x_(ij)] is the transmitted pilot matrix 102 of size N_(t)×T_(p), which is known at the receiver. Here, x_(ij) is the signal transmitted from transmit antenna i at time j. The spatially and temporally white noise matrix W_(p) is defined in a similar manner; its entries have variance σ_(w)².

Data Transmission

The noise vectors at different time instants are independent and identically distributed. Therefore, optimizing the capacity for block transmissions is equivalent to optimizing the capacity for vector transmissions. For any given time instant, the received vector y_(d) is related to the transmitted signal vector x_(d) by

$$y_d = H x_d + w_d, \tag{3}$$

where w_(d) is the spatially white noise vector. The vectors y_(d), x_(d), and w_(d) have dimensions N_(r)×1, N_(t)×1, and N_(r)×1, respectively.

Other Notation

The notation E_(Γ₁|Γ₂)[.] denotes an expectation over the RV Γ₁ given Γ₂; (.)^(†) is the Hermitian transpose, (.)^(T) is the transpose, (.)^((k)) is the k×k principal sub-matrix that includes the first k rows and columns, Tr{.} is the trace, |.| is the determinant, and I_(n) denotes the n×n identity matrix.

Q_(d)=E_(x_d)[x_(d)x^(†)_(d)] and Q_(p)=X_(p)X^(†)_(p) denote the data signal and pilot signal covariance matrices, respectively. Because the pilot signal X_(p) is a deterministic matrix, no expectation operator is used in defining Q_(p).

The eigen decompositions of Q_(d), Q_(p), and R_(t) are Q_(d)=U_(d)Λ_(d)U^(†)_(d), Q_(p)=U_(p)Λ_(p)U^(†)_(p), and R_(t)=U_(t)Λ_(t)U^(†)_(t), and the singular value decomposition (SVD) of X_(p) is X_(p)=U_(p)Σ_(p)V^(†)_(p). Note that Q_(d), Q_(p), and R_(t) are all Hermitian matrices, i.e., they equal their Hermitian transposes. Also, Λ_(p)=Σ_(p)Σ_(p)^(†).

MMSE Channel Estimator

Given the covariance information and the pilot signals X_(p), the MMSE channel estimator passes the received pilot signal Y_(p) through a deterministic matrix filter to generate the channel estimate Ĥ. For R_(r)=I_(Nr), it can be shown that

$$\hat{H} = Y_p \left( \mathbb{E}_{Y_p|X_p}\left[ Y_p^\dagger Y_p \right] \right)^{-1} \mathbb{E}_{H,Y_p|X_p}\left[ Y_p^\dagger H \right]. \tag{4}$$

Substituting equation (2) into equation (4) and simplifying gives Ĥ=Y_(p)A=(HX_(p)+W_(p))A, where the matrix filter A is

$$A = \left( X_p^\dagger R_t X_p + \sigma_w^2 I_{T_p} \right)^{-1} X_p^\dagger R_t. \tag{5}$$
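The MMSE filter of equation (5) takes only a few lines of NumPy. The setup below (N_(t)=4, N_(r)=2, eigen-matched pilots with unit eigenvalues, σ_(w)²=0.5, hypothetical R_(t)) is illustrative, and mmse_filter and cgauss are hypothetical names. The simulation compares the empirical per-receive-antenna mean square error of Ĥ=Y_(p)A with Tr{R_(t)−R_(t)X_(p)A}, the standard MMSE error expression for this linear Gaussian model.

```python
import numpy as np


def mmse_filter(X_p, R_t, sigma2):
    """A = (X_p^H R_t X_p + sigma_w^2 I_{T_p})^{-1} X_p^H R_t, equation (5)."""
    T_p = X_p.shape[1]
    G = X_p.conj().T @ R_t @ X_p + sigma2 * np.eye(T_p)
    return np.linalg.solve(G, X_p.conj().T @ R_t)        # T_p x N_t


def cgauss(rng, shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


# Hypothetical setup: N_t = 4, N_r = 2, eigen-matched pilots with T_p = 4.
idx = np.arange(4)
R_t = 0.7 ** np.abs(idx[:, None] - idx[None, :]).astype(float)
lam_t, U_t = np.linalg.eigh(R_t)
R_half = U_t @ np.diag(np.sqrt(lam_t)) @ U_t.T            # R_t^{1/2} (R_t is real here)
X_p = U_t @ np.diag(np.sqrt([1.0, 1.0, 1.0, 1.0]))       # X_p = U_p Lambda_p^{1/2}
sigma2, n_r, trials = 0.5, 2, 20_000

A = mmse_filter(X_p, R_t, sigma2)
rng = np.random.default_rng(3)
H = cgauss(rng, (trials, n_r, 4)) @ R_half               # batch of channels, R_r = I
W_p = np.sqrt(sigma2) * cgauss(rng, (trials, n_r, 4))    # training noise, variance sigma_w^2
Y_p = H @ X_p + W_p                                      # equation (2)
H_hat = Y_p @ A                                          # H_hat = Y_p A

mse_per_rx = np.sum(np.abs(H - H_hat) ** 2) / (trials * n_r)
mse_theory = np.trace(R_t - R_t @ X_p @ A).real          # Tr{R_t - R_t X_p A}
```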

As shown in Appendix A, Ĥ is statistically equivalent to

$$\hat{H} = \tilde{H}_w \tilde{R}_t^{1/2}, \tag{6}$$

where H̃_(w) is spatially white with unit-variance entries, and R̃_(t) is given by

$$\tilde{R}_t = R_t X_p \left( X_p^\dagger R_t X_p + \sigma_w^2 I_{T_p} \right)^{-1} X_p^\dagger R_t. \tag{7}$$

The above result shows that, in general, R̃_(t)≠R_(t), i.e., for an MMSE estimator, the estimation error also affects the transmit covariance of the estimated channel Ĥ, and cannot be modeled merely by adding spatially white noise to H_(w), as is done in the prior art.
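The matrix R̃_(t) of equation (7) is straightforward to compute. The sketch below, with arbitrary (hypothetical) complex pilots and T_(p)=2<N_(t), checks three properties: R̃_(t) equals A^(†)(X_(p)^(†)R_(t)X_(p)+σ_(w)²I_(Tp))A with A from equation (5), i.e., the per-antenna covariance E[Ĥ^(†)Ĥ]/N_(r) of the estimate; R_(t)−R̃_(t) is positive semidefinite; and R̃_(t)≠R_(t), with rank limited by T_(p). The function name tilde_R is hypothetical.

```python
import numpy as np


def tilde_R(R_t, X_p, sigma2):
    """R~_t = R_t X_p (X_p^H R_t X_p + sigma_w^2 I_{T_p})^{-1} X_p^H R_t, equation (7)."""
    T_p = X_p.shape[1]
    G = X_p.conj().T @ R_t @ X_p + sigma2 * np.eye(T_p)
    return R_t @ X_p @ np.linalg.solve(G, X_p.conj().T @ R_t)


idx = np.arange(4)
R_t = 0.7 ** np.abs(idx[:, None] - idx[None, :]).astype(float)          # hypothetical R_t
rng = np.random.default_rng(4)
X_p = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))    # arbitrary pilots
sigma2 = 0.5

R_tilde = tilde_R(R_t, X_p, sigma2)
gap_eigs = np.linalg.eigvalsh(R_t - R_tilde)   # eigenvalues of R_t - R~_t

# Cross-check against A^H (X_p^H R_t X_p + sigma_w^2 I) A, with A from equation (5).
G = X_p.conj().T @ R_t @ X_p + sigma2 * np.eye(2)
A = np.linalg.solve(G, X_p.conj().T @ R_t)
```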

Capacity with Estimation Error

A channel estimation error is defined as Δ=H−Ĥ. From equation (3), it follows that data transmission is governed by

$$y_d = \hat{H} x_d + \Delta x_d + w_d. \tag{8}$$

A lower bound on the channel capacity is obtained by considering a suboptimal receiver that treats the term e=Δx_(d)+w_(d) as Gaussian noise. The channel capacity is therefore lower bounded by

$$C_\Delta = \left( 1 - \frac{T_p}{T} \right) \mathbb{E}_{\hat{H}} \log_2 \left| I_{N_t} + \hat{H}^\dagger \left( \mathbb{E}_e\left[ e e^\dagger \right] \right)^{-1} \hat{H} Q_d \right|. \tag{9}$$

The factor (1−T_(p)/T) is a training penalty resulting from the pilot transmissions, which carry no information. Equation (6) implies that the distribution of Ĥ is left rotationally invariant, i.e., π(ΘĤ)=π(Ĥ), where π(.) denotes a probability distribution function and Θ is any unitary matrix. It therefore follows that C_(Δ) is further lower bounded by

$$C_\Delta \geq C_L = \left( 1 - \frac{T_p}{T} \right) \mathbb{E}_{\hat{H}} \log_2 \left| I_{N_t} + \frac{1}{\sigma_w^2 + \sigma_l^2} \hat{H}^\dagger \hat{H} Q_d \right|, \tag{10}$$

where

$$\sigma_l^2 = \frac{1}{N_r} \mathrm{Tr}\left\{ \mathbb{E}_{\Delta, x_d}\left[ \Delta x_d x_d^\dagger \Delta^\dagger \right] \right\}.$$

As shown in Appendix B, σ_(l)² reduces to

$$\sigma_l^2 = \mathrm{Tr}\left\{ Q_d \left( R_t - \tilde{R}_t \right) \right\}. \tag{11}$$
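The reduction of σ_(l)² to Tr{Q_(d)(R_(t)−R̃_(t))} can be checked by Monte Carlo. This sketch, under hypothetical parameters (arbitrary complex pilots with T_(p)=3, a diagonal Q_(d), σ_(w)²=0.5), simulates Δ=H−Ĥ together with independent data vectors x_(d) with covariance Q_(d) (Gaussian here, an assumption), and compares (1/N_(r))Tr E[Δx_(d)x_(d)^(†)Δ^(†)] with the closed form.

```python
import numpy as np


def cgauss(rng, shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


idx = np.arange(4)
R_t = 0.7 ** np.abs(idx[:, None] - idx[None, :]).astype(float)   # hypothetical R_t
lam_t, U_t = np.linalg.eigh(R_t)
R_half = U_t @ np.diag(np.sqrt(lam_t)) @ U_t.T
sigma2, n_r, trials = 0.5, 2, 40_000

rng = np.random.default_rng(5)
X_p = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))   # arbitrary pilots, T_p = 3
Q_d = np.diag([1.5, 1.0, 0.5, 0.0]).astype(complex)                   # hypothetical data covariance

G = X_p.conj().T @ R_t @ X_p + sigma2 * np.eye(3)
A = np.linalg.solve(G, X_p.conj().T @ R_t)                 # MMSE filter, equation (5)
R_tilde = R_t @ X_p @ A                                    # equation (7)
sigma_l2_closed = np.trace(Q_d @ (R_t - R_tilde)).real     # closed form

# Monte Carlo estimate of (1/N_r) Tr E[Delta x_d x_d^H Delta^H] with Delta = H - H_hat
H = cgauss(rng, (trials, n_r, 4)) @ R_half
H_hat = (H @ X_p + np.sqrt(sigma2) * cgauss(rng, (trials, n_r, 3))) @ A
x_d = np.sqrt(np.diag(Q_d).real)[None, :, None] * cgauss(rng, (trials, 4, 1))
e_self = (H - H_hat) @ x_d                                 # self-interference term Delta x_d
sigma_l2_mc = np.sum(np.abs(e_self) ** 2) / (trials * n_r)
```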

Optimal Pilot and Data Signals

We now maximize the lower bound C_(L) on the MIMO capacity with imperfect knowledge of the channel state. This maximization problem can be stated as

$$\max_{U_d, \Lambda_d, X_p, T_p} \left( 1 - \frac{T_p}{T} \right) \mathbb{E}_{\hat{H}} \log_2 \left| I_{N_t} + \frac{\hat{H}^\dagger \hat{H} Q_d}{\sigma_w^2 + \mathrm{Tr}\left\{ Q_d \left( R_t - \tilde{R}_t \right) \right\}} \right|, \tag{12}$$

subject to the total power/time constraint P_(p)T_(p)+P_(d)T_(d)=PT, where Tr{Q_(d)}=P_(d), Tr{X_(p)X^(†)_(p)}=P_(p)T_(p), and P is the total power.
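The objective in equation (12) has no closed form, but for a given X_(p) and Q_(d) it can be estimated by Monte Carlo using the representation Ĥ=H̃_(w)R̃_(t)^(1/2) of equation (6). The function capacity_lower_bound below is a hypothetical sketch rather than a reference implementation, and the pilot and data eigenvalues are made up; it uses numpy.linalg.slogdet on stacked matrices. Because the training penalty enters only through (1−T_(p)/T), reusing the same random seed for two values of T changes the estimate by exactly that factor.

```python
import numpy as np


def psd_sqrt(R):
    """Hermitian square root of a positive semidefinite matrix."""
    lam, U = np.linalg.eigh(R)
    return U @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ U.conj().T


def capacity_lower_bound(R_t, X_p, Q_d, sigma2, T, n_r, n_mc, rng):
    """Monte Carlo estimate of the objective C_L in equation (12), in bits/s/Hz."""
    n_t, T_p = X_p.shape
    G = X_p.conj().T @ R_t @ X_p + sigma2 * np.eye(T_p)
    R_tilde = R_t @ X_p @ np.linalg.solve(G, X_p.conj().T @ R_t)      # equation (7)
    sigma_l2 = np.trace(Q_d @ (R_t - R_tilde)).real                  # self-interference
    H_w = (rng.standard_normal((n_mc, n_r, n_t))
           + 1j * rng.standard_normal((n_mc, n_r, n_t))) / np.sqrt(2)
    H_hat = H_w @ psd_sqrt(R_tilde)                                  # equation (6)
    M = np.eye(n_t) + H_hat.conj().transpose(0, 2, 1) @ H_hat @ Q_d / (sigma2 + sigma_l2)
    _, logdet = np.linalg.slogdet(M)                                 # natural log |det|, batched
    return (1.0 - T_p / T) * np.mean(logdet) / np.log(2.0)


idx = np.arange(4)
R_t = 0.7 ** np.abs(idx[:, None] - idx[None, :]).astype(float)      # hypothetical R_t
lam, U = np.linalg.eigh(R_t)
U_t = U[:, np.argsort(lam)[::-1]]
X_p = U_t[:, :2] @ np.diag(np.sqrt([2.0, 1.0]))                     # rank-2 pilots, T_p = 2
Q_d = U_t[:, :2] @ np.diag([1.2, 0.8]) @ U_t[:, :2].conj().T         # rank-2 data covariance

c_short = capacity_lower_bound(R_t, X_p, Q_d, 0.5, T=10, n_r=2, n_mc=5000,
                               rng=np.random.default_rng(6))
c_long = capacity_lower_bound(R_t, X_p, Q_d, 0.5, T=100, n_r=2, n_mc=5000,
                              rng=np.random.default_rng(6))
```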

We first state the following lemma.

Lemma 1:

If the matrices AB and BA are positive semi-definite, there always exists a permutation τ such that Tr{AB}=Tr{BA}=Σ_(i)σ_(i)(A)σ_(τ(i))(B), where σ_(i)(.) denotes the i-th eigenvalue. The following theorem concerns only the self-interference term σ_(l)².

Theorem 1:

$$\min_{U_p, U_d} \sigma_l^2 = \min_{U_p, U_d} \mathrm{Tr}\left\{ Q_d \left( R_t - \tilde{R}_t \right) \right\} = \sigma_w^2 \sum_{i=1}^{N_t} \frac{\lambda_{d_i} \lambda_{t_i}}{\sigma_w^2 + \lambda_{t_i} \lambda_{p_i}}, \tag{13}$$

where λ_(t1)≧λ_(t2)≧…, λ_(d1)≧λ_(d2)≧…, and λ_(p1)≧λ_(p2)≧…, with λ_(pi)=0 for i>k_(p), the rank of X_(p).

Proof: We first derive a lower bound on σ_(l)² through a sequence of inequalities, without commenting at each step on the conditions required for equality. At the end, we show that equality is indeed achievable. Let k_(p) denote the rank of the pilot symbol matrix X_(p). First, we define the following matrices:

$$S_3 = U^\dagger \begin{bmatrix} \left( \left( U \Lambda_t U^\dagger \right)^{(k_p)} + \sigma_w^2 \left( \Lambda_p^{(k_p)} \right)^{-1} \right)^{-1} & 0 \\ 0 & 0 \end{bmatrix} U, \qquad S_2 = \Lambda_t \left( I_{N_t} - S_3 \Lambda_t \right), \qquad S_1 = V S_2 V^\dagger,$$

where U=U^(†)_(p)U_(t) and V=U^(†)_(d)U_(t). As shown in Appendix C, σ_(l)²=Tr{Λ_(d)S₁}. Therefore,

$$\min_{U_p, U_d} \sigma_l^2 = \min_{U, V} \mathrm{Tr}\left\{ \Lambda_d S_1 \right\} = \min_{\tau_1, U} \sum_i \sigma_i(\Lambda_d)\, \sigma_{\tau_1(i)}(S_1). \tag{14}$$

Because S₁ and S₂ have the same eigenvalues, there exists a permutation τ₂ such that σ_(i)(S₁)=σ_(τ₂(i))(S₂). The following step eliminates V:

$$\min \mathrm{Tr}\left\{ \Lambda_d S_1 \right\} = \min_{\tau_3 = \tau_2 \circ \tau_1} \sum_i \sigma_i(\Lambda_d)\, \sigma_{\tau_3(i)}(S_2) = \min \mathrm{Tr}\left\{ \Lambda_d S_2 \right\}. \tag{15}$$

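Simplifying further, and using S₂=Λ_(t)−Λ_(t)S₃Λ_(t),

$$\min \mathrm{Tr}\left\{ \Lambda_d S_1 \right\} = \min \mathrm{Tr}\left\{ \Lambda_d S_2 \right\} = \mathrm{Tr}\left\{ \Lambda_d \Lambda_t \right\} - \max \mathrm{Tr}\left\{ \Lambda_d \Lambda_t^2 S_3 \right\}.$$

We therefore only need to maximize Tr{Λ_(d)Λ_(t)²S₃}. We define

$$U = \begin{bmatrix} U^{(k_p)} & D \\ E & F \end{bmatrix} \quad \text{and} \quad \Lambda_t = \begin{bmatrix} \Lambda_t^{(k_p)} & 0 \\ 0 & \Lambda_t^{(\mathrm{rest})} \end{bmatrix}.$$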

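As shown in Appendix D,

$$\mathrm{Tr}\left\{ \Lambda_d \Lambda_t^2 S_3 \right\} \leq \mathrm{Tr}\left\{ S_4^{-1} \right\}, \tag{16}$$

where

$$S_4 = \left( \Lambda_d^{(k_p)} \right)^{-1} \left( \Lambda_t^{(k_p)} \right)^{-1} \left( I_{k_p} + \sigma_w^2 \left( U^{(k_p)} \right)^{-1} \left( \Lambda_p^{(k_p)} \right)^{-1} \left( U^{(k_p)} \right)^{-\dagger} \left( \Lambda_t^{(k_p)} \right)^{-1} \right), \tag{17}$$

and equality occurs when U^((kp)) is unitary. Note that S₄ is independent of D.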

Using matrix algebra, we know that

$$\frac{\partial \mathrm{Tr}\left\{ S_4^{-1} \right\}}{\partial U^{(k_p)}} = S_4^{-2} \frac{\partial \mathrm{Tr}\left\{ S_4 \right\}}{\partial U^{(k_p)}}.$$

Because S₄ is invertible, this implies that the extrema of Tr{S₄} and Tr{S₄⁻¹}, at which the partial derivatives equal 0, are identical. Because U^((kp)) is unitary, Lemma 1 implies that the extrema of Tr{S₄} occur when U^((kp)) is a unitary permutation matrix. After substituting in equation (16), the identity permutation U^((kp))=I_(kp) can be shown to maximize Tr{Λ_(d)Λ_(t)²S₃}.

Therefore,

$$\sigma_l^2 \geq \mathrm{Tr}\left\{ \Lambda_d \Lambda_t \right\} - \sum_{i=1}^{k_p} \frac{\lambda_{t_i} \lambda_{d_i}}{1 + \sigma_w^2 \lambda_{p_i}^{-1} \lambda_{t_i}^{-1}} = \sigma_w^2 \sum_{i=1}^{N_t} \frac{\lambda_{d_i} \lambda_{t_i}}{\sigma_w^2 + \lambda_{t_i} \lambda_{p_i}}. \tag{18}$$

Finally, equality is verified by substituting U_(p)=U_(d)=U_(t) in Tr{Q_(d)(R_(t)−R̃_(t))}. Let the eigen decomposition of R̃_(t) be Ũ_(t)Λ̃_(t)Ũ_(t)^(†).
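The equality claim can be verified numerically. With U_(p)=U_(d)=U_(t), a rank-2 pilot block (T_(p)=k_(p)=2), and hypothetical eigenvalues, the sketch computes Tr{Q_(d)(R_(t)−R̃_(t))} directly from equation (7) and compares it with σ_(w)²Σ_(i)λ_(di)λ_(ti)/(σ_(w)²+λ_(ti)λ_(pi)) summed over all N_(t) modes, as in equation (18). The data eigenvalues deliberately include a nonzero λ_(d3) on a mode without pilot power (λ_(p3)=0), which then contributes λ_(d3)λ_(t3) to the sum. The helper name sigma_l2_closed_form is hypothetical.

```python
import numpy as np


def sigma_l2_closed_form(lam_t, lam_p, lam_d, sigma2):
    """sigma_w^2 * sum_i lam_d_i lam_t_i / (sigma_w^2 + lam_t_i lam_p_i), over all N_t modes."""
    return sigma2 * sum(ld * lt / (sigma2 + lt * lp) for lt, lp, ld in zip(lam_t, lam_p, lam_d))


idx = np.arange(4)
R_t = 0.7 ** np.abs(idx[:, None] - idx[None, :]).astype(float)   # hypothetical R_t
lam, U = np.linalg.eigh(R_t)
order = np.argsort(lam)[::-1]
U_t, lam_t = U[:, order], lam[order]            # real orthogonal U_t, so U_t^H = U_t.T

sigma2 = 0.5
lam_p = np.array([2.0, 1.0, 0.0, 0.0])          # hypothetical pilot eigenvalues, k_p = 2
lam_d = np.array([1.2, 0.8, 0.3, 0.0])          # hypothetical data eigenvalues

# Eigenspace matching U_p = U_d = U_t with a rank-k_p pilot block (T_p = k_p = 2).
X_p = U_t[:, :2] @ np.diag(np.sqrt(lam_p[:2]))
Q_d = U_t @ np.diag(lam_d) @ U_t.T
G = X_p.T @ R_t @ X_p + sigma2 * np.eye(2)
R_tilde = R_t @ X_p @ np.linalg.solve(G, X_p.T @ R_t)          # equation (7)

sigma_l2_matrix = np.trace(Q_d @ (R_t - R_tilde))
sigma_l2_closed = sigma_l2_closed_form(lam_t, lam_p, lam_d, sigma2)
```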

The following theorem gives the pilot and data signals, according to the invention, that maximize C_(L).

Theorem 2:

C_(L) satisfies the upper bound

$$C_L(\Lambda_d, U_d, X_p) \leq \left( 1 - \frac{T_p}{T} \right) \mathbb{E}_{\tilde{H}_w} \log_2 \left| I_{N_t} + \frac{\tilde{H}_w^\dagger \tilde{H}_w \tilde{\Lambda}_t \Lambda_d}{\sigma_w^2 + \sigma_w^2 \sum_{i=1}^{N_t} \frac{\lambda_{t_i} \lambda_{d_i}}{\sigma_w^2 + \lambda_{p_i} \lambda_{t_i}}} \right|. \tag{19}$$

Furthermore, the upper bound is achieved when U_(d)=U_(p)=U_(t)=Ũ_(t), and, therefore, constitutes an optimal solution.

Proof: C_(L) is a function of Q_(d)=U_(d)Λ_(d)U^(†)_(d) and of X_(p)=U_(p)Σ_(p)V^(†)_(p), which affects R̃_(t). Starting from equation (12), the following sequence of inequalities holds:

$$\begin{aligned} C_L(\Lambda_d, U_d, X_p) &= \left( 1 - \frac{T_p}{T} \right) \mathbb{E}_{\tilde{H}_w} \log_2 \left| I_{N_t} + \frac{\tilde{H}_w^\dagger \tilde{H}_w \tilde{R}_t^{1/2} Q_d \tilde{R}_t^{1/2\,\dagger}}{\sigma_w^2 + \mathrm{Tr}\left\{ Q_d \left( R_t - \tilde{R}_t \right) \right\}} \right| \\ &\leq \left( 1 - \frac{T_p}{T} \right) \mathbb{E}_{\tilde{H}_w} \log_2 \left| I_{N_t} + \frac{\tilde{H}_w^\dagger \tilde{H}_w \tilde{R}_t^{1/2} Q_d \tilde{R}_t^{1/2\,\dagger}}{\sigma_w^2 + \min_{U_d, U_p} \mathrm{Tr}\left\{ Q_d \left( R_t - \tilde{R}_t \right) \right\}} \right| \qquad (20) \\ &= \left( 1 - \frac{T_p}{T} \right) \mathbb{E}_{\tilde{H}_w} \log_2 \left| I_{N_t} + \frac{\tilde{H}_w^\dagger \tilde{H}_w \tilde{R}_t^{1/2} Q_d \tilde{R}_t^{1/2\,\dagger}}{\sigma_w^2 + \sigma_w^2 \sum_{i=1}^{N_t} \frac{\lambda_{d_i} \lambda_{t_i}}{\sigma_w^2 + \lambda_{p_i} \lambda_{t_i}}} \right|. \qquad (21) \end{aligned}$$

Equation (21) follows from Theorem 1. Recall that Tr{Q_(d)}=Tr{Λ_(d)}=P_(d). Because the denominator in equation (21) is independent of U_(d), for the same data power P_(d) the expression for C_(L) in equation (21) is maximized, and thereby upper bounded, when U_(d)=Ũ_(t). Substituting this in equation (21) leads to equation (19).

The last step is to verify that equality is achievable. This can be done by substituting U_(p)=U_(d)=U_(t)=Ũ_(t) in the formula for C_(L).

The proof of Theorem 2 obtains consecutive upper bounds by first minimizing the denominator and then independently maximizing the numerator. In general, the arguments that achieve the two optimizations need not be the same. However, we have shown above that they are indeed the same in our setup. After eigenspace matching,

$$\tilde{\Lambda}_t = \Lambda_t^2 \Lambda_p \left( \Lambda_t \Lambda_p + \sigma_w^2 I_{N_t} \right)^{-1}. \tag{22}$$
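Equation (22) can be checked numerically. With eigen-matched pilots X_(p)=U_(t)Λ_(p)^(1/2) (hypothetical rank-3 pilot eigenvalues, T_(p)=3) and a hypothetical R_(t), the matrix U_(t)^(†)R̃_(t)U_(t) computed from equation (7) should be diagonal with entries λ_(ti)²λ_(pi)/(λ_(ti)λ_(pi)+σ_(w)²), including a zero entry for the mode without pilot power.

```python
import numpy as np

idx = np.arange(4)
R_t = 0.7 ** np.abs(idx[:, None] - idx[None, :]).astype(float)   # hypothetical R_t
lam, U = np.linalg.eigh(R_t)
order = np.argsort(lam)[::-1]
U_t, lam_t = U[:, order], lam[order]            # real orthogonal, so U_t^H = U_t.T

sigma2 = 0.5
k = 3
lam_p = np.array([2.0, 1.0, 0.5, 0.0])          # hypothetical pilot eigenvalues, rank k = 3
X_p = U_t[:, :k] @ np.diag(np.sqrt(lam_p[:k]))  # eigen-matched pilots, T_p = k

G = X_p.T @ R_t @ X_p + sigma2 * np.eye(k)
R_tilde = R_t @ X_p @ np.linalg.solve(G, X_p.T @ R_t)          # equation (7)
Lam_tilde_matrix = U_t.T @ R_tilde @ U_t                        # R~_t in the U_t basis
Lam_tilde_eq22 = lam_t**2 * lam_p / (lam_t * lam_p + sigma2)    # diagonal of equation (22)
```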

We now investigate the rank properties of the optimal Q_(d) and Q_(p). Let k_(d) and k_(p) denote the ranks of Q_(d) and Q_(p), respectively.

Theorem 3:

To maximize the channel capacity C_(L), the data signal and pilot signal covariance matrices Q_(d) and Q_(p) must have the same rank.

Proof: The proof is in Appendix E.

The next theorem determines the optimal training duration.

Theorem 4:

The channel capacity C_(L) is maximized when T_(p)=k_(p)=k, where k=k_(d)=k_(p) is the common rank of Q_(d) and Q_(p).

Proof: The proof is in Appendix F.

This implies that the optimal training duration T_(p), in terms of pilot symbols, can indeed be made less than N_(t) given CovKT. This duration is a function of the transmit eigenvalues Λ_(t), and the total power P. Moreover, given that k=k_(d)≦min(N_(t), N_(r)), the following is an important corollary for transmit diversity systems in which the number of receive antennas is less than the number of transmit antennas, i.e., N_(r)<N_(t).

Corollary 1:

T_(p)≦min(N_(t), N_(r)).

In summary, for the system under consideration, the data and pilot sequences satisfy the following properties:

-   (a) The eigenspaces match, i.e., U_(t)=U_(p)=U_(d)=Ũ_(t);
-   (b) The ranks match, i.e., rank(Q_(d))=rank(Q_(p))=k; and
-   (c) The training duration, in units of symbol durations, need only equal the rank k.

For a given rank k, the N_(t)−k eigenvectors of Q_(d) and Q_(p) corresponding to the zero eigenvalues are irrelevant.

The eigenvalues of the covariance matrices Q_(d) and Q_(p), namely Λ_(d) and Λ_(p), and thereby P_(d), P_(p), and k, depend on P, T, and Λ_(t), and are optimized numerically.

These conditions according to the invention, combined with simple expressions for C_(L) and Λ̃_(t), drastically reduce the search space for all the optimal parameters, and make the numerical search feasible.

Sub-Optimal Embodiments

We now focus on the pilot and data loading (Λ_(p) and Λ_(d)) and show how their computation can be simplified considerably.

Pilot Signal Loading to Minimize Self-Interference σ_(l) ²

We first consider the power loading for the pilot signal that minimizes the self-interference term σ_(l)². This results in a closed-form relationship between the loading for the data and pilot signals. As shown in Appendix G, the solution to the self-interference minimization problem min_(Λp) σ_(l)², subject to the constraint Tr{Λ_(p)}=P_(p)T_(p), is

$$\lambda_{p_i} = \left( \mu \sqrt{\lambda_{d_i}} - \frac{\sigma_w^2}{\lambda_{t_i}} \right)^{+}, \quad 1 \leq i \leq k, \tag{23}$$

where

$$\mu = \frac{P_p T_p + \sigma_w^2 \sum_{i=1}^{k} \lambda_{t_i}^{-1}}{\sum_{i=1}^{k} \sqrt{\lambda_{d_i}}}, \tag{24}$$

and (.)⁺ denotes max(., 0).
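Equations (23) and (24) translate into a short routine. Equation (24) is written for the case where all k modes receive positive pilot power; the sketch below handles the (.)⁺ clipping by dropping modes with non-positive power and recomputing μ over the remaining modes, a water-filling-style iteration that is an assumption here rather than a step stated in the text. The names pilot_loading and self_interference are hypothetical, and the inputs are made-up eigenvalues in descending order.

```python
import math


def pilot_loading(lam_t, lam_d, energy, sigma2):
    """Pilot eigenvalues minimizing sigma_l^2 for fixed lam_d, equations (23)-(24).

    lam_t, lam_d: the first k transmit and data eigenvalues (all lam_d > 0).
    energy: pilot energy P_p * T_p = Tr{Lambda_p}.
    Modes with non-positive power are dropped and mu is recomputed over the
    remaining modes (assumed handling of the (.)^+ clipping).
    """
    active = list(range(len(lam_t)))
    while True:
        mu = (energy + sigma2 * sum(1.0 / lam_t[i] for i in active)) / sum(
            math.sqrt(lam_d[i]) for i in active)                       # equation (24)
        lam_p = [0.0] * len(lam_t)
        for i in active:
            lam_p[i] = mu * math.sqrt(lam_d[i]) - sigma2 / lam_t[i]    # equation (23)
        dropped = [i for i in active if lam_p[i] <= 0.0]
        if not dropped:
            return lam_p
        active = [i for i in active if i not in dropped]


def self_interference(lam_t, lam_p, lam_d, sigma2):
    """sigma_l^2 with matched eigenspaces: sigma_w^2 * sum lam_d lam_t / (sigma_w^2 + lam_t lam_p)."""
    return sigma2 * sum(ld * lt / (sigma2 + lt * lp) for lt, lp, ld in zip(lam_t, lam_p, lam_d))


lam_t_ex, lam_d_ex = [4.0, 2.0, 1.0], [1.0, 1.0, 1.0]        # hypothetical eigenvalues
lam_p_opt = pilot_loading(lam_t_ex, lam_d_ex, energy=3.0, sigma2=1.0)
lam_p_clipped = pilot_loading([4.0, 0.1], [1.0, 1.0], energy=1.0, sigma2=1.0)
```

For these inputs the loaded pilots yield a smaller σ_(l)² than splitting the same pilot energy uniformly, and in the second call the weak mode is switched off entirely.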

Minimizing the denominator (noise) term of C_(L) in this way, without taking the numerator into account, need not maximize C_(L), because it ignores the dependence of Λ̃_(t) on Λ_(p). However, the above relationship halves the number of unknowns and serves as a good starting point for the numerical optimization routines that determine the optimal pilot and data eigenvalues Λ_(p) and Λ_(d).

Minimizing σ_(l)² with respect to Λ_(d) is not of interest: σ_(l)² is linear in the data eigenvalues (see (38)), so this results in a degenerate k=1 transmit diversity solution for all P_(d).

Uniform Selective Eigenmode Loading

We consider a scheme that allocates equal power to all the eigenmodes in use for the data and pilot signals (symbols). The number of eigenmodes used and the ratio of the powers allocated to the pilot and data signals are optimized numerically. Note that the optimization is then over only two variables, 1≦k≦N_(t) and α, where α sets the power split between pilots and data, and is therefore considerably simpler than the joint optimization over Λ_(p) and Λ_(d).
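The two-variable search can be sketched as follows. This is a hedged Monte Carlo illustration, not the routine used in the text: it evaluates C_(L) from (35) with matched eigenspaces and T_(p)=k, takes Λ̃_(t) from the closed form λ̃_(t_i)=λ_(t_i)²λ_(p_i)/(λ_(t_i)λ_(p_i)+σ_(w)²) implied by Appendix C for U_(p)=U_(t), models H̃_(w) with i.i.d. CN(0,1) entries, and interprets α as the fraction of the block energy P·T spent on data. All three modeling choices and both function names are assumptions.

```python
import numpy as np


def capacity_lower_bound(lambda_t, lambda_p, lambda_d, T, noise_var, n_r, n_trials, seed=0):
    """Monte Carlo estimate of C_L in (35) for matched eigenspaces and T_p = k.

    lambda_t, lambda_p, lambda_d: arrays holding the k modes in use, same order.
    """
    rng = np.random.default_rng(seed)
    k = len(lambda_d)
    # Estimated-channel eigenvalues for U_p = U_t (from R~_t in Appendix C)
    lt_tilde = lambda_t**2 * lambda_p / (lambda_t * lambda_p + noise_var)
    # Denominator of (35): sigma_w^2 + Tr{Λ_t Λ_d} - Tr{Λ~_t Λ_d} = sigma_w^2 + sigma_l^2
    noise = noise_var + np.sum(lambda_d * (lambda_t - lt_tilde))
    total = 0.0
    for _ in range(n_trials):
        # Assumption: H~_w restricted to the k modes is n_r x k with i.i.d. CN(0, 1) entries
        h = (rng.standard_normal((n_r, k)) + 1j * rng.standard_normal((n_r, k))) / np.sqrt(2)
        m = np.eye(k) + (h.conj().T @ h) * (lt_tilde * lambda_d) / noise
        _, logdet = np.linalg.slogdet(m)
        total += logdet / np.log(2)
    return (1 - k / T) * total / n_trials


def uniform_selective_loading(lambda_t, P, T, noise_var, n_r, n_alpha=19, n_trials=300):
    """Hypothetical grid search over the number of modes k and the split alpha.

    Assumption: alpha is the fraction of the block energy P*T spent on data,
    and pilots and data each spread their energy equally over the k strongest modes.
    """
    lambda_t = np.sort(np.asarray(lambda_t, dtype=float))[::-1]
    best = (-np.inf, None, None)
    for k in range(1, min(lambda_t.size, T - 1) + 1):
        for alpha in np.linspace(0.05, 0.95, n_alpha):
            pilot_energy = (1 - alpha) * P * T      # Tr{Λ_p} = P_p T_p, with T_p = k
            data_power = alpha * P * T / (T - k)    # P_d = Tr{Λ_d} per data symbol
            lp = np.full(k, pilot_energy / k)
            ld = np.full(k, data_power / k)
            c = capacity_lower_bound(lambda_t[:k], lp, ld, T, noise_var, n_r, n_trials)
            if c > best[0]:
                best = (c, k, alpha)
    return best  # (C_L estimate, k, alpha)


# Example: three transmit eigenmodes, two receive antennas
c, k, alpha = uniform_selective_loading([3.0, 0.8, 0.2], P=10.0, T=20, noise_var=1.0, n_r=2)
print(f"C_L ~ {c:.2f} bits/s/Hz with k = {k}, alpha = {alpha:.2f}")
```

Reusing the same seed for every candidate (common random numbers) keeps the comparison between neighboring α values for a given k from being dominated by Monte Carlo noise.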

The capacity achieved by the uniform selective eigenmode loading scheme is within 0.1 bits/sec/Hz of the optimal C_(L) for all P and σ_(θ), and for several values of N_(r) and N_(t). While this result is expected for high P or when the eigenvalues of R_(t) are similar, the near-optimal performance for all P and σ_(θ) is not obvious. The explanation lies in the loading of the data signal at the transition points at which additional eigenmodes are turned on.

Effect of the Invention

The invention provides a method for determining the pilot and data signals in multiple-input, multiple-output communications systems where channel knowledge is imperfect at the receiver and partial channel knowledge, such as covariance knowledge, is available at the transmitter.

The invention also provides for power loading of the pilot and data signals. The invention exploits covariance knowledge at the transmitter to generate the pilot and data signals. The case where channel state information at the receiver is acquired using pilot-aided MMSE channel estimation is described.

An optimal embodiment of the invention was considered. The invention uses an analytically tractable lower bound on the ergodic channel capacity, and shows that the lower bound is maximized when the eigenspaces of the covariance matrices of the pilot and data signals match the eigenspaces of the transmit covariance matrix R_(t). Furthermore, it is sufficient to transmit the data signals over only those eigenmodes of R_(t) that are allocated power during training.

Indeed, the optimal training duration can be less than the number of transmit antennas, and equals the number of eigenmodes used for data transmission. For small angular spreads, our system with covariance knowledge and imperfect CSIR outperforms prior art systems with perfect CSIR but without any covariance knowledge. These results contrast with those obtained without assuming any channel knowledge, even statistical, at the transmitter; in that case, the optimal U_(p) is I_(N_t), and the optimal T_(p) is always N_(t).

For larger angular spreads, imperfect CSIR negates the benefits that accrue by using covariance knowledge. Uniform power loading over the eigenmodes used for data transmission and training achieves near-optimal performance for all values of interest of angular spread and power. This behavior is unlike the prior art case of perfect CSIR and perfect instantaneous CSIT, where conventional water-filling is optimal and markedly outperforms uniform power loading at low SNR for small angular spreads.

The invention provides an explicit relationship between pilot and data signal eigenmode power allocations to minimize self-interference noise due to imperfect estimation.

Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

Appendices

A. Statistically Equivalent Representation of Ĥ

From (4) and the Kronecker model for H in (1), we have Ĥ=H_(w)R_(t)^(1/2)X_(p)A+W_(p)A. Let ĥ_(i), r_(i), and w_(i) denote the ith rows of Ĥ, H_(w), and W_(p), respectively. They are related by

$\hat{h}_i = r_iR_t^{1/2}X_pA + w_iA. \qquad (25)$

Given that H_(w) and W_(p) are zero-mean and mutually uncorrelated, and that distinct rows of each are uncorrelated, the rows of Ĥ are uncorrelated:

$\mathbb{E}_{\hat{h}_i,\hat{h}_j}\left[\hat{h}_i^{\dagger}\hat{h}_j\right] = 0, \quad (i \neq j). \qquad (26)$

When i=j, the correlation is given by

$\mathbb{E}_{\hat{h}_i}\left[\hat{h}_i^{\dagger}\hat{h}_i\right] = A^{\dagger}X_p^{\dagger}\left(R_t^{1/2}\right)^{\dagger}\mathbb{E}_{r_i}\left[r_i^{\dagger}r_i\right]R_t^{1/2}X_pA + A^{\dagger}\mathbb{E}_{w_i}\left[w_i^{\dagger}w_i\right]A \qquad (27)$

$= A^{\dagger}\left(X_p^{\dagger}R_tX_p + \sigma_w^2I_{T_p}\right)A \qquad (28)$

$= \tilde{R}_t, \quad \text{for all } i. \qquad (29)$

Eqn. (28) follows from (27) because $\mathbb{E}_{r_i}\left[r_i^{\dagger}r_i\right] = I_{N_t}$ and $\mathbb{E}_{w_i}\left[w_i^{\dagger}w_i\right] = \sigma_w^2I_{T_p}$. Combining (26) and (29) yields the desired result.

B. Formula for σ_(l)²

The expression for σ_(l)² can be simplified as follows:

$\sigma_l^2 = \frac{1}{N_r}\mathrm{Tr}\left\{\mathbb{E}_{x_d,\Delta}\left[\Delta x_dx_d^{\dagger}\Delta^{\dagger}\right]\right\} = \frac{1}{N_r}\mathrm{Tr}\left\{\mathbb{E}_{\Delta}\left[\Delta^{\dagger}\Delta\right]Q_d\right\} = \frac{1}{N_r}\mathrm{Tr}\left\{\mathbb{E}_{H,\hat{H}}\left[H^{\dagger}H - \hat{H}^{\dagger}H\right]Q_d\right\} \qquad (30)$

$= \frac{1}{N_r}\mathrm{Tr}\left\{\left(\left(R_t^{1/2}\right)^{\dagger}\mathbb{E}_{H_w}\left[H_w^{\dagger}H_w\right]R_t^{1/2} - \mathbb{E}_{H,\hat{H}}\left[\hat{H}^{\dagger}H\right]\right)Q_d\right\} \qquad (31)$

$\sigma_l^2 = \frac{1}{N_r}\mathrm{Tr}\left\{\left(N_rR_t - N_rA^{\dagger}X_p^{\dagger}R_t\right)Q_d\right\}. \qquad (32)$

Eqn. (30) follows from the orthogonality property of the linear estimation error, $\mathbb{E}_{\Delta,\hat{H}}\left[\Delta^{\dagger}\hat{H}\right] = 0$. Eqn. (31) simplifies to (32) because $\mathbb{E}_{H_w}\left[H_w^{\dagger}H_w\right] = N_rI_{N_t}$ and, since the pilot noise W_(p) is zero-mean and uncorrelated with H, $\mathbb{E}_{H,\hat{H}}\left[\hat{H}^{\dagger}H\right] = N_rA^{\dagger}X_p^{\dagger}R_t$. The desired expression in (11) follows from the expression for R̃_(t) derived in (7), and the fact that R̃_(t) is Hermitian.
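Appendices A and B can be sanity-checked by simulation. The sketch below draws H=H_(w)R_(t)^(1/2) and white pilot noise, forms Ĥ=(HX_(p)+W_(p))A, and compares the sample row covariance of Ĥ with R̃_(t) from (29), and the sample self-interference with Tr{Q_(d)(R_(t)−R̃_(t))}. It assumes the LMMSE form A=(X_(p)^†R_(t)X_(p)+σ_(w)²I_(T_p))^(−1)X_(p)^†R_(t), which is consistent with (28), (29), and (33) but is not restated in this section; `check_appendix_a_b` is a hypothetical name.

```python
import numpy as np


def check_appendix_a_b(R_t, X_p, Q_d, noise_var, n_r, n_trials, seed=0):
    """Monte Carlo check of (29) and of sigma_l^2 = Tr{Q_d (R_t - R~_t)} (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_t, t_p = X_p.shape

    def cn(*shape):
        # i.i.d. circularly symmetric CN(0, 1) entries
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    # Hermitian square root, so that (R_t^{1/2})^† R_t^{1/2} = R_t
    w, V = np.linalg.eigh(R_t)
    R_half = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T
    # Assumed LMMSE estimator matrix A (T_p x N_t), consistent with (28), (29) and (33)
    M = X_p.conj().T @ R_t @ X_p + noise_var * np.eye(t_p)
    A = np.linalg.solve(M, X_p.conj().T @ R_t)
    R_tilde = R_t @ X_p @ A
    cov_hat = np.zeros((n_t, n_t), dtype=complex)
    err_cov = np.zeros((n_t, n_t), dtype=complex)
    for _ in range(n_trials):
        H = cn(n_r, n_t) @ R_half                  # Kronecker model with i.i.d. H_w
        W_p = np.sqrt(noise_var) * cn(n_r, t_p)    # noise during training
        H_hat = (H @ X_p + W_p) @ A                # Ĥ = H_w R_t^{1/2} X_p A + W_p A
        Delta = H - H_hat
        cov_hat += H_hat.conj().T @ H_hat / n_r    # averages ĥ_i^† ĥ_i over the rows
        err_cov += Delta.conj().T @ Delta / n_r
    cov_hat /= n_trials
    err_cov /= n_trials
    sigma_l2_mc = np.trace(err_cov @ Q_d).real      # (1/N_r) Tr{E[Δ^†Δ] Q_d}
    sigma_l2_formula = np.trace(Q_d @ (R_t - R_tilde)).real
    return cov_hat, R_tilde, sigma_l2_mc, sigma_l2_formula


R_t = 0.6 ** np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
X_p = 1.5 * np.eye(3)[:, :2]   # any N_t x T_p pilot block works for this check
cov_hat, R_tilde, mc, formula = check_appendix_a_b(R_t, X_p, 0.5 * np.eye(3), 1.0, n_r=4, n_trials=2000)
print(np.abs(cov_hat - R_tilde).max(), mc, formula)
```

The check holds for an arbitrary X_(p); the eigenspace matching of the invention only matters once the trace is minimized.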


C. Simplifying σ_(l)²=Tr{Q_(d)(R_(t)−R̃_(t))}

In terms of the SVD X_(p)=U_(p)Σ_(p)V_(p)^†, R̃_(t) can be written as

$\tilde{R}_t = R_tU_p\Sigma_p\left(\Sigma_p^{\dagger}U_p^{\dagger}R_tU_p\Sigma_p + \sigma_w^2I_{T_p}\right)^{-1}\Sigma_p^{\dagger}U_p^{\dagger}R_t. \qquad (33)$

In general, the rank k_(p) of Σ_(p) is less than N_(t). Therefore,

$\Sigma_p = \begin{bmatrix}\Sigma_p^{(k_p)} & 0\\ 0 & 0\end{bmatrix},$

where Σ_(p)^(k_p) is invertible. Substituting this in (33) and then moving Σ_(p)^(k_p) inside the inverse yields

$\tilde{R}_t = R_tU_p\begin{bmatrix}\left(\left(U_p^{\dagger}R_tU_p\right)^{(k_p)} + \sigma_w^2\left(\Lambda_p^{(k_p)}\right)^{-1}\right)^{-1} & 0\\ 0 & 0\end{bmatrix}U_p^{\dagger}R_t.$

Therefore,

$\mathrm{Tr}\left\{Q_d\left(R_t - \tilde{R}_t\right)\right\} = \mathrm{Tr}\left\{Q_dR_t\left(I_{N_t} - U_p\begin{bmatrix}\left(\left(U_p^{\dagger}R_tU_p\right)^{(k_p)} + \sigma_w^2\left(\Lambda_p^{(k_p)}\right)^{-1}\right)^{-1} & 0\\ 0 & 0\end{bmatrix}U_p^{\dagger}R_t\right)\right\}.$

Expressing R_(t) in terms of its SVD, consolidating, and rearranging terms finally results in

$\sigma_l^2 = \mathrm{Tr}\left\{\Lambda_dV\Lambda_t\left(I_{N_t} - U^{\dagger}\begin{bmatrix}\left(\left(U\Lambda_tU^{\dagger}\right)^{(k_p)} + \sigma_w^2\left(\Lambda_p^{(k_p)}\right)^{-1}\right)^{-1} & 0\\ 0 & 0\end{bmatrix}U\Lambda_t\right)V^{\dagger}\right\}, \qquad (34)$

where U=U_(p)^†U_(t) and V=U_(d)^†U_(t).

D. Simplifying Tr{Λ_(d)Λ_(t)²S₃}

After block matrix multiplications, $\left(U\Lambda_tU^{\dagger}\right)^{(k_p)} = U^{(k_p)}\Lambda_t^{(k_p)}U^{(k_p)\dagger} + D\Lambda_t^{(\mathrm{rest})}D^{\dagger}$. Hence,

$\mathrm{Tr}\left\{\Lambda_d\Lambda_t^2S_3\right\} = \mathrm{Tr}\left\{\Lambda_d^{(k_p)}\left(\Lambda_t^{(k_p)}\right)^2U^{(k_p)\dagger}\left[U^{(k_p)}\Lambda_t^{(k_p)}U^{(k_p)\dagger} + D\Lambda_t^{(\mathrm{rest})}D^{\dagger} + \sigma_w^2\left(\Lambda_p^{(k_p)}\right)^{-1}\right]^{-1}U^{(k_p)}\right\}.$

Moving U^(k_p)†, U^(k_p), and Λ_(t)^(k_p) into the inverse⁷, we get

$\mathrm{Tr}\left\{\Lambda_d\Lambda_t^2S_3\right\} = \mathrm{Tr}\left\{\Lambda_d^{(k_p)}\Lambda_t^{(k_p)}\left[I_{k_p} + \sigma_w^2\left(U^{(k_p)}\right)^{-1}\left(\Lambda_p^{(k_p)}\right)^{-1}\left(U^{(k_p)\dagger}\right)^{-1}\left(\Lambda_t^{(k_p)}\right)^{-1} + G\right]^{-1}\right\},$

where G is positive semi-definite. Removing G cannot decrease the trace. Therefore, the desired eqns. (16) and (17) follow.

⁷ U^(k_p) is invertible because U is unitary.
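For matched eigenspaces, the trace expression of Appendix C should collapse to the closed form (38). The following hypothetical check builds R̃_(t) directly from (33) with X_(p)=U_(p)Λ_(p)^(1/2) (that is, V_(p)=I) and compares Tr{Q_(d)(R_(t)−R̃_(t))} with (38); it assumes U_(p)=U_(d) equal to the k strongest eigenvectors of R_(t), and the function names and numbers are illustrative only.

```python
import numpy as np


def sigma_l2_general(R_t, U_p, lambda_p, Q_d, noise_var):
    """Tr{Q_d (R_t - R~_t)} with R~_t from (33), for X_p = U_p Λ_p^{1/2} and V_p = I."""
    X_p = U_p * np.sqrt(lambda_p)
    M = X_p.conj().T @ R_t @ X_p + noise_var * np.eye(X_p.shape[1])
    R_tilde = R_t @ X_p @ np.linalg.solve(M, X_p.conj().T @ R_t)
    return np.trace(Q_d @ (R_t - R_tilde)).real


def sigma_l2_matched(lambda_t, lambda_d, lambda_p, noise_var):
    """Closed form (38) for U_p = U_d = U_t."""
    return noise_var * np.sum(lambda_d * lambda_t / (noise_var + lambda_t * lambda_p))


n_t, k, noise_var = 4, 2, 1.0
R_t = 0.7 ** np.abs(np.subtract.outer(np.arange(n_t), np.arange(n_t)))
eigvals, vecs = np.linalg.eigh(R_t)
order = np.argsort(eigvals)[::-1]
lambda_t, U_t = eigvals[order], vecs[:, order]
lambda_p, lambda_d = np.array([2.0, 1.0]), np.array([1.5, 0.5])
U_k = U_t[:, :k]                       # U_p = U_d = the k strongest eigenvectors
Q_d = (U_k * lambda_d) @ U_k.conj().T
print(sigma_l2_general(R_t, U_k, lambda_p, Q_d, noise_var),
      sigma_l2_matched(lambda_t[:k], lambda_d, lambda_p, noise_var))
```

The two printed values agree up to rounding, because with U=V=I the bracketed block in (34) becomes diagonal.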


E. Data and Pilot Rank Matching

Let k_(p)=rank(Λ_(p)) and k_(d)=rank(Λ_(d)), and let k=min(k_(d), k_(p)). From (22), it can be seen that rank(Λ̃_(t)Λ_(d))=k. Therefore, Λ̃_(t)Λ_(d) is of the form

$\tilde{\Lambda}_t\Lambda_d = \begin{bmatrix}\tilde{\Lambda}_t^{(k)}\Lambda_d^{(k)} & 0\\ 0 & 0\end{bmatrix}.$

Given the eigenspace matching result from Thm. 2, C_(L) simplifies to

$C_L = \left(1 - \frac{T_p}{T}\right)\mathbb{E}_{\tilde{H}_w}\log_2\left|I_k + \frac{\left(\tilde{H}_w^{\dagger}\tilde{H}_w\right)^{(k)}\tilde{\Lambda}_t^{(k)}\Lambda_d^{(k)}}{\sigma_w^2 + \mathrm{Tr}\left\{\Lambda_t^{(k_d)}\Lambda_d^{(k_d)}\right\} - \mathrm{Tr}\left\{\tilde{\Lambda}_t^{(k)}\Lambda_d^{(k)}\right\}}\right|. \qquad (35)$

The above equation implies that the N_(t)−k weakest eigenvalues of Λ_(p), namely $\lambda_{p_{k+1}}, \ldots, \lambda_{p_{N_t}}$, play no role in the capacity expression. They must be set to 0 to conserve energy for the pilots of the modes in use. Hence, k_(p)≦k.

We now show that any scenario other than k=k_(p)=k_(d) is sub-optimal. If k_(p)>k_(d), then k=min(k_(p), k_(d))=k_(d). But k_(p)≦k from the argument above, so this case is impossible. If k_(d)>k_(p), then k=k_(p). Allocating any power to the data eigenmodes $\lambda_{d_{k+1}}, \ldots, \lambda_{d_{N_t}}$ does not affect the numerator, $\left(\tilde{H}_w^{\dagger}\tilde{H}_w\right)^{(k)}\tilde{\Lambda}_t^{(k)}\Lambda_d^{(k)}$, in (35), while it increases the denominator (noise) term $\mathrm{Tr}\left\{\Lambda_t^{(k_d)}\Lambda_d^{(k_d)}\right\}$. Hence, this case is also sub-optimal.


F. Optimal Training Duration

From Thm. 3, we know that T_(p)≧k_(p)=k. Suppose that a value of T_(p) strictly greater than k is optimal, with data and pilot covariance matrices Λ_(d)^(o) and Λ_(p)^(o), respectively.⁸

⁸ Setting V_(p)=I_(T_p) does not affect R̃_(t) and C_(L), and shows that having T_(p)>k is equivalent to not transmitting any pilot power in the last T_(p)−k slots allocated for training. The proof shows that this is sub-optimal.

Now consider the case where the pilots are transmitted over just T_(p)−1 time instants with the same pilot covariance matrix Λ_(p)=Λ_(p)^(o), while the data is now transmitted for one more time instant. To satisfy the total energy constraint, the new data covariance matrix is set to Λ_(d)=βΛ_(d)^(o), where

$\beta = \frac{T - T_p}{T - T_p + 1} < 1.$

While the data is now transmitted for a longer duration, the rate achieved per transmission is reduced due to the lower power. We now show that, for a given data power P_(d) used when the training time was T_(p), the difference between the two capacities, f(P_(d))=T[C(T_(p)−1)−C(T_(p))], is positive. f(P_(d)) can be written as

$f(P_d) = (T - T_p + 1)\,\mathbb{E}_D\left[\log_2\left|I_{N_t} + \frac{P_d\beta D}{\sigma_w^2 + P_d\beta\delta_1}\right|\right] - (T - T_p)\,\mathbb{E}_D\left[\log_2\left|I_{N_t} + \frac{P_dD}{\sigma_w^2 + P_d\delta_1}\right|\right], \qquad (36)$

where $D = \tilde{H}_w^{\dagger}\tilde{H}_w\tilde{\Lambda}_t^{o}\bar{\Lambda}_d^{o}$ and $\delta_1 = \sum_{i=1}^{N_t}\left(\lambda_{t_i} - \tilde{\lambda}_{t_i}\right)\bar{\lambda}_{d_i}^{o} > 0$. Here, $\bar{\Lambda}_d^{o} = \frac{1}{P_d}\Lambda_d^{o}$ denotes the power-normalized Λ_(d)^(o) and is independent of P_(d); $\bar{\lambda}_{d_i}^{o}$ is its ith diagonal element.

We first show that $\frac{df}{dP_d} > 0$. The derivative of the determinant of an arbitrary invertible matrix M is given by $\frac{d|M|}{dx} = |M|\,\mathrm{Tr}\left\{M^{-1}\frac{dM}{dx}\right\}$. It can then be shown that

$\frac{df}{dP_d} = \mathbb{E}_D\left[\mathrm{Tr}\left\{(T - T_p + 1)\left(I_{N_t} + \frac{P_d\beta}{\sigma_w^2 + \beta P_d\delta_1}D\right)^{-1}D\right\}\right]\frac{\beta\sigma_w^2}{\ln(2)\left(\sigma_w^2 + \beta P_d\delta_1\right)^2} - \mathbb{E}_D\left[\mathrm{Tr}\left\{(T - T_p)\left(I_{N_t} + \frac{P_d}{\sigma_w^2 + P_d\delta_1}D\right)^{-1}D\right\}\right]\frac{\sigma_w^2}{\ln(2)\left(\sigma_w^2 + P_d\delta_1\right)^2}$

$> \frac{\sigma_w^2(T - T_p)}{\ln(2)\left(\sigma_w^2 + P_d\delta_1\right)^2}\,\mathbb{E}_D\left[\mathrm{Tr}\left\{\left(I_{N_t} + \frac{\beta P_d}{\sigma_w^2 + \beta P_d\delta_1}D\right)^{-1}D - \left(I_{N_t} + \frac{P_d}{\sigma_w^2 + P_d\delta_1}D\right)^{-1}D\right\}\right].$

The last step follows because $(T - T_p + 1)\beta = T - T_p$ and because $\frac{\left(\sigma_w^2 + P_d\delta_1\right)^2}{\left(\sigma_w^2 + \beta P_d\delta_1\right)^2} > 1$ if β<1. Using the relation

$\mathrm{Tr}\left\{\left(I_{N_t} + qD\right)^{-1}D\right\} = \sum_{i=1}^{N_t}\frac{\lambda_{D_i}}{1 + q\lambda_{D_i}}, \quad (q \geq 0),$

where $\lambda_{D_i}$ are the eigenvalues of D, and simplifying gives

$\frac{df}{dP_d} > \frac{\sigma_w^2(T - T_p)}{\ln(2)\left(\sigma_w^2 + P_d\delta_1\right)^2}\,\mathbb{E}_D\left[\sum_{i=1}^{N_t}\frac{\lambda_{D_i}^2\left(\alpha(1) - \alpha(\beta)\right)}{\left(1 + \alpha(\beta)\lambda_{D_i}\right)\left(1 + \alpha(1)\lambda_{D_i}\right)}\right], \qquad (37)$

where α(β)=βP_(d)/(σ_(w)²+βP_(d)δ₁).

Given that α(β)<α(1) for β<1, every term in (37) is nonnegative and the bound is strictly positive. We therefore get $\frac{df}{dP_d} > 0$. Notice also that $\lim_{P_d \to 0} f(P_d) = 0$. This, along with $\frac{df}{dP_d} > 0$, implies that f(P_(d))>0. This shows that any T_(p)>k is necessarily sub-optimal.
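The argument of Appendix F can also be checked numerically. The sketch below estimates f(P_(d)) in (36) by Monte Carlo for a case with T_(p)>k. It assumes H̃_(w) has i.i.d. CN(0,1) entries and uses λ̃_(t_i)=λ_(t_i)²λ_(p_i)/(λ_(t_i)λ_(p_i)+σ_(w)²) for matched eigenspaces; `training_gap` and the example numbers are illustrative only. Because the positivity argument above holds for every realization of D, each Monte Carlo sample, and hence the estimate, should be positive for any P_(d)>0.

```python
import numpy as np


def training_gap(P_d, lambda_t, lambda_p, lambda_d_bar, T, T_p, noise_var, n_r, n_trials=500, seed=0):
    """Monte Carlo estimate of f(P_d) in (36), in bits per block of T symbols.

    lambda_d_bar: diagonal of the power-normalized data covariance (entries sum to 1).
    Modes with zero pilot and data power contribute nothing.
    """
    rng = np.random.default_rng(seed)
    n_t = len(lambda_t)
    beta = (T - T_p) / (T - T_p + 1)
    lt_tilde = lambda_t**2 * lambda_p / (lambda_t * lambda_p + noise_var)
    delta_1 = np.sum((lambda_t - lt_tilde) * lambda_d_bar)
    a_beta = beta * P_d / (noise_var + beta * P_d * delta_1)   # α(β)
    a_one = P_d / (noise_var + P_d * delta_1)                   # α(1)
    total = 0.0
    for _ in range(n_trials):
        # Assumption: H~_w is n_r x N_t with i.i.d. CN(0, 1) entries
        h = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        D = (h.conj().T @ h) * (lt_tilde * lambda_d_bar)        # H~_w^† H~_w Λ~_t^o Λ̄_d^o
        _, ld_beta = np.linalg.slogdet(np.eye(n_t) + a_beta * D)
        _, ld_one = np.linalg.slogdet(np.eye(n_t) + a_one * D)
        total += ((T - T_p + 1) * ld_beta - (T - T_p) * ld_one) / np.log(2)
    return total / n_trials


lambda_t = np.array([2.0, 1.0, 0.3])
lambda_p = np.array([3.0, 2.0, 0.0])       # k = 2 modes are trained ...
lambda_d_bar = np.array([0.6, 0.4, 0.0])   # ... and used for data, while T_p = 3 > k
for P_d in (0.5, 2.0, 8.0):
    print(P_d, training_gap(P_d, lambda_t, lambda_p, lambda_d_bar, T=20, T_p=3, noise_var=1.0, n_r=2))
```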


G. Λ_(d) and Λ_(p) Relationship for Minimizing σ_(l)²

From Thm. 1, we know that

$\min_{U_p,U_d}\sigma_l^2 = \sigma_w^2\sum_{i=1}^{k}\frac{\lambda_{d_i}\lambda_{t_i}}{\sigma_w^2 + \lambda_{t_i}\lambda_{p_i}}. \qquad (38)$

Minimizing the above expression with respect to λ_(p1), ..., λ_(pk), subject to the trace constraint $\sum_{i=1}^{k}\lambda_{p_i} = P_pT_p$, amounts to finding a stationary point of the Lagrangian

$g = \sigma_w^2\sum_{i=1}^{k}\frac{\lambda_{d_i}\lambda_{t_i}}{\sigma_w^2 + \lambda_{t_i}\lambda_{p_i}} + \delta\left(\sum_{i=1}^{k}\lambda_{p_i} - P_pT_p\right), \qquad (39)$

where δ is the Lagrange multiplier. Because each term of (38) is convex in λ_(pi), this stationary point is the minimum. Solving $\frac{\partial g}{\partial\lambda_{p_j}} = 0$ results in (23). Substituting (23) in the trace constraint gives (24).
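The stationarity step behind (23) and (24) can be written out as follows. This is a worked restatement that takes the positive root, since σ_(w)²+λ_(t_j)λ_(p_j)>0, and assumes all k modes receive pilot power; when (·)⁺ clips a mode to zero, the sums run only over the remaining modes.

```latex
\frac{\partial g}{\partial \lambda_{p_j}}
  = -\frac{\sigma_w^2\,\lambda_{d_j}\lambda_{t_j}^2}{\left(\sigma_w^2 + \lambda_{t_j}\lambda_{p_j}\right)^2} + \delta = 0
\;\Longrightarrow\;
\sigma_w^2 + \lambda_{t_j}\lambda_{p_j} = \frac{\sigma_w\,\lambda_{t_j}\sqrt{\lambda_{d_j}}}{\sqrt{\delta}}
\;\Longrightarrow\;
\lambda_{p_j} = \mu\sqrt{\lambda_{d_j}} - \frac{\sigma_w^2}{\lambda_{t_j}},
\qquad \mu = \frac{\sigma_w}{\sqrt{\delta}},

\sum_{i=1}^{k}\lambda_{p_i}
  = \mu\sum_{i=1}^{k}\sqrt{\lambda_{d_i}} - \sigma_w^2\sum_{i=1}^{k}\lambda_{t_i}^{-1} = P_pT_p
\;\Longrightarrow\;
\mu = \frac{P_pT_p + \sigma_w^2\sum_{i=1}^{k}\lambda_{t_i}^{-1}}{\sum_{i=1}^{k}\sqrt{\lambda_{d_i}}}.
```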

1. A method for generating signals in a transmitter of a multiple-input, multiple-output wireless communications system, the transmitter including N_(t) transmit antennas, comprising: determining a transmit covariance matrix R_(t) based on statistical state information of a channel; decomposing the transmit covariance matrix R_(t) using transmit eigenvalues Λ_(t) to obtain a transmit eigenspace U_(t) according to R_(t)=U_(t)Λ_(t)U_(t)^(†), where † is a Hermitian transpose; setting a pilot eigenspace U_(p) equal to the transmit eigenspace U_(t); and generating an N_(t)×T_(p) block of pilot symbols X_(p) from the pilot eigenspace U_(p) and pilot eigenvalues Λ_(p) according to X_(p)=U_(p)Λ_(p)^(1/2).
 2. The method of claim 1, in which the pilot eigenvalues are based strictly on a signal duration T and a power P allocated to transmitted signals.
 3. The method of claim 1, in which the block of pilot symbols X_(p) has an arbitrary right eigenspace V_(p), thereby taking a general form X_(p)=U_(p)Λ_(p) ^(1/2)V_(p) ^(†).
 4. The method of claim 1, further comprising: setting a data eigenspace U_(d) equal to the transmit eigenspace U_(t); generating a N_(t)×N_(t) data covariance matrix Q_(d) according to U_(d)Λ_(d)U^(†) _(d), where Λ_(d) are data eigenvalues; and generating a N_(t)×T_(d) block of data symbols, such that an average covariance of each of the columns in the block of data symbols X_(d) equals the data covariance matrix Q_(d).
 5. The method of claim 4, further comprising: combining the block of pilot symbols and the block of data symbols, such that X=[X_(p), X_(d)]; and transmitting each of N_(t) rows of the matrix X to a different one of the N_(t) antennas.
 6. The method of claim 4, further comprising: setting a rank of a pilot covariance matrix Q_(p) equal to a rank of the data covariance matrix Q_(d) to maximize a capacity of the channel.
 7. The method of claim 6, in which the channel capacity is maximized when a number of pilot signals T_(p) is equal to the rank.
 8. The method of claim 7, in which T_(p)≦min(N_(t), N_(r)), where N_(r) is a number of receive antennas.
 9. The method of claim 2, further comprising: allocating power equally to all eigenmodes used for the data symbols and the pilot symbols.
 10. The method of claim 9, in which a number of eigenmodes used and a ratio of power allocated to the pilot symbols and the data symbols are optimized numerically. 